ABSTRACT

A new general parallel algorithmic technique for computations on trees is presented.
The new technique performs a reduction of the tree expression evaluation problem
to list ranking; then, the list ranking provides a schedule for evaluating the tree
operations. The technique needs logarithmic time using an optimal number of processors
and has applications to other tree problems.

This new technique enables us to systematically order four basic ideas and techniques
for parallel algorithms on trees: (1) The list ranking problem. (2) The Euler
tour technique on trees. (3) The centroid decomposition technique. (4) The new
accelerated centroid decomposition (ACD) technique.

1.   Introduction 

The model of parallel computation used in this paper is the concurrent-read
exclusive-write (CREW) parallel random access machine (PRAM). A PRAM employs p
synchronous processors all having access to a common memory. A CREW PRAM allows
concurrent access by several processors to the same common memory location for read
but not for write purposes. See [V-83a] for a survey of results concerning PRAMs.

Let  Seq(n)  be  the  fastest  known  worst-case  running  time  of  a  sequential  algorithm, 
where  n  is  the  length  of  the  input  for  the  problem  at  hand.  Obviously,  the  best  upper 
bound  on  the  parallel  time  achievable  using  p  processors,  without  improving  the  sequential 
result, is of the form O(Seq(n)/p). A parallel algorithm that achieves this running time is
said  to  have  optimal  speed-up  or  more  simply  to  be  optimal. 

This paper deals with the general topic of parallel tree algorithms. It provides two
kinds of contributions: one is methodological and the other is a concrete complexity
result. Specifically, we suggest a methodological order for the fundamental parallel
algorithmic techniques which relate to trees. This order identifies the list ranking
problem as most basic. Second in this order is the Euler tour technique (of [TV-85] and
[Vi-85]), which is based on list ranking. It provides a solution for many tree problems.
Third is the centroid decomposition idea and the simple O(log^2 n) time parallel
algorithm it implies. This idea was mentioned in [Me-83] among several others. So far
nothing new has been added.

[MR-85] gave an elegant parallel algorithmic technique for a class of tree problems
including evaluation of an arithmetic expression. Their contribution is not only to provide
a new technique but also to characterize the class of problems that it can solve. Our new
technique benefits greatly from their characterization.

We propose an alternative accelerated centroid decomposition (ACD) technique which
solves the same class of problems using only the centroid decomposition idea and any
appropriate parallel list ranking algorithm. Such a list ranking algorithm is used both
indirectly, through the Euler tour technique, and directly. In other words, we solve this
class of problems by a logarithmic time optimal parallel reduction into the list ranking
problem. Since [CV-86b] provides a logarithmic time optimal parallel list ranking algorithm,
the new accelerated centroid decomposition technique also achieves this desired efficiency.
Recall that [MR-85] gave a deterministic logarithmic time parallel algorithm using a linear
number of processors and a randomized logarithmic time optimal parallel algorithm. Our
technique is considerably simpler than this randomized algorithm. Another interesting
difference with respect to the Miller-Reif technique is that they characterize their technique
as a dynamic expression evaluation with no preprocessing. Our approach is to analyze the
structure of the tree first. This is done efficiently using known methods. Later we use this
analysis to schedule the evaluation part properly. Interestingly, the overall efficiency of our
approach compares favorably with theirs. So it is "evaluate on-the-fly" versus
"analyze-first evaluate-later".

To sum up, the incremental contribution of this paper is in presenting the fundamental
parallel algorithmic techniques for trees in a "textbook like" fashion. Specifically, we
show how to derive the more involved accelerated centroid decomposition technique from
the more elementary techniques, namely the centroid decomposition technique, the Euler
tour technique and list ranking. Also, as we mentioned above, the Euler tour technique is
based on list ranking.

2.   Preliminaries 

The  (extended)  list  ranking  problem. 

Input:  n  nodes  each  occupying  an  entry  of  an  array  whose  size  is  n.  Each  node  has  at  most 
one  successor  in  the  array.  A  node  having  a  successor  has  the  array  index  of  this  successor. 
Consider  a  path  that  starts  from  any  node  and  follows  the  successor  relation.  We  assume 
that  such  a  path  never  provides  a  circuit.    Each  node  v  has  a  weight  w(v). 

The problem: For each node compute the total weight of the nodes following it in its linked
list.

Recently, [CV-86b] gave a logarithmic time parallel algorithm for this problem which
uses an optimal number of processors (n/log n). The present paper relies on this
algorithm. Note that for all practical purposes we could have used the O(log n log* n) time
parallel algorithm of [CV-86a] which also uses an optimal number of processors and has
the advantage that the constants which are hidden by the "Big Oh" notation are quite
modest, by contrast with [CV-86b]; incidentally, it is easy to see that for n < 2^65536,
log* n ≤ 5 (see [AHU-74], p. 133).
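
To make the list ranking problem concrete, the following Python sketch ranks a list by simple pointer jumping. It is an illustration only and is not the optimal algorithm of [CV-86b] or [CV-86a]: it performs O(n log n) operations in O(log n) rounds. The function name `list_rank`, the use of `None` for "no successor", and the convention that a node's own weight is excluded are assumptions of this sketch. Each pass of the `while` loop simulates one synchronous parallel step.

```python
def list_rank(succ, weight):
    """Suffix weights by pointer jumping.

    succ[v] is the array index of v's successor, or None at the end of a list.
    Returns rank[v] = total weight of the nodes strictly after v in its list.
    """
    n = len(succ)
    nxt = list(succ)
    # Invariant: rank[v] is the weight of the nodes after v up to and including
    # nxt[v] (or up to the end of the list when nxt[v] is None).
    rank = [weight[s] if s is not None else 0 for s in succ]
    while any(s is not None for s in nxt):
        new_rank, new_nxt = rank[:], nxt[:]
        for v in range(n):  # conceptually: for each v pardo
            s = nxt[v]
            if s is not None:
                new_rank[v] = rank[v] + rank[s]
                new_nxt[v] = nxt[s]
        rank, nxt = new_rank, new_nxt
    return rank
```

For the list 0, 2, 1 stored as `succ = [2, None, 1]` with weights `[5, 1, 10]`, node 0 is followed by weights 10 and 1, node 2 by weight 1, and node 1 by nothing.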

The Euler tour technique on trees. Let T = (V,E) be a rooted tree, where r is its
root. For each node v ≠ r, there is a single edge of the form v → u; u is the parent of v.
Step 1. For each edge u → v in T add its anti-parallel edge v → u. Denote the new graph H.
Since in-degree(v) = out-degree(v) for every node v, H has an Euler path that starts and
ends at the root r. Step 2 computes this path into the vector of pointers D, where for each
edge e of H, D(e) will have the successor edge of e in the Euler path.

Step 2. For each node v of H we do the following. (Let out-degree(v) = d in H and let
the outgoing edges of v be v → u_1, ..., v → u_d.)
D(u_i → v) := v → u_{(i mod d) + 1} for 1 ≤ i ≤ d.

Now D has an Euler circuit. The "correction",

D(u_d → r) := end of list        (where d = in-degree(r))

gives an Euler path that starts and ends at r.

Next, we show how to use this Euler path in order to find for each node v the number of
nodes in the subtree rooted at v, denoted SIZE(v).

Step 3.
Initialize:
for each e in H − T pardo
    R(e) := 1
for each e in T pardo
    R(e) := 0
In words: if e in H is the incoming edge of a node in the direction from the root then its
weight is one and otherwise it is zero.

Apply a list ranking algorithm to find for each e in H its (weighted) distance to the end of
the Euler path, counting the weight R(e) of e itself, and store the result in DISTANCE(e).

For each v ≠ r consider the edge e = v → u in T where u is the parent of v, and let
ē = u → v be the anti-parallel edge of e in H. It is not difficult to observe that

SIZE(v) := DISTANCE(ē) − DISTANCE(e).

Also,

SIZE(r) := "total weighted length of the Euler path" + 1
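
Steps 1 to 3 can be traced with the following sequential Python sketch. It is only an illustration under stated assumptions: the tree is given as a hypothetical `parent` array with `parent[root] = None`, the cyclic order of outgoing edges (children first, parent last) is an arbitrary choice, and a plain backward scan stands in for the parallel list ranking algorithm. DISTANCE(e) here includes the weight of e itself.

```python
def subtree_sizes(parent, root):
    """SIZE(v) for every node v, computed from the Euler path of H."""
    n = len(parent)
    if n == 1:
        return [1]
    # Outgoing edges of each node in H: to its children, then to its parent.
    out = [[] for _ in range(n)]
    for v, p in enumerate(parent):
        if p is not None:
            out[p].append(v)
    for v, p in enumerate(parent):
        if p is not None:
            out[v].append(p)
    # Step 2: the successor of edge u_i -> v is v -> u_{i+1}, cyclically.
    D = {}
    for v in range(n):
        d = len(out[v])
        for i, u in enumerate(out[v]):
            D[(u, v)] = (v, out[v][(i + 1) % d])
    D[(out[root][-1], root)] = None  # the "correction": cut the circuit at r
    # Step 3: weight 1 on parent -> child edges (H - T), 0 on edges of T.
    R = {(a, b): 1 if parent[b] == a else 0 for (a, b) in D}
    # Sequential stand-in for list ranking along the Euler path.
    path, e = [], (root, out[root][0])
    while e is not None:
        path.append(e)
        e = D[e]
    distance, total = {}, 0
    for e in reversed(path):
        total += R[e]
        distance[e] = total
    size = [0] * n
    size[root] = total + 1
    for v, p in enumerate(parent):
        if p is not None:
            size[v] = distance[(p, v)] - distance[(v, p)]
    return size
```

In a parallel setting the backward scan would be replaced by a list ranking routine applied to the successor pointers D with weights R.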

Centroid  decomposition 

Let T = (V,E) be a rooted tree, where r is its root. Let n = |V|. Recall that SIZE(v)
is the number of nodes in the subtree of T rooted at v. Let the centroid level of v, denoted
C-LEVEL(v), be ⌈log SIZE(v)⌉.*

Observation.  Each  node  v  has  at  most  one  child  u  such  that 
C-LEVEL(u)  =  C-LEVEL(v). 

Accordingly, we define the centroid path of v to be the longest directed path of nodes and
tree edges, passing through v, where all the nodes on the path have the same centroid level
as v. We note that there might be several disjoint centroid paths of the same centroid
level.

This  partition  of  the  nodes  of  T  into  centroid  paths  is  called  the  centroid  decomposition 
of  T.  See  Fig.  1.  There  are  a  few  alternative  definitions  for  the  centroid  decomposition 
notion.    One  such  definition  is  given  in  Comment  2  in  the  next  section. 
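
As an illustration, the centroid decomposition can be computed from the SIZE values as in the Python sketch below. It assumes that C-LEVEL(v) is the ceiling of log SIZE(v), and it takes hypothetical `parent` and `size` arrays as input; the function name `centroid_paths` is likewise an assumption of the sketch. The assertion inside the loop checks the Observation above.

```python
def centroid_paths(parent, size):
    """Return (level, paths); each centroid path is listed head first."""
    n = len(parent)
    # For an integer s >= 1, ceil(log2 s) equals (s - 1).bit_length().
    level = [(s - 1).bit_length() for s in size]
    centroid_child = [None] * n
    for v, p in enumerate(parent):
        if p is not None and level[v] == level[p]:
            # Observation: at most one child shares the level of its parent.
            assert centroid_child[p] is None
            centroid_child[p] = v
    paths = []
    for v, p in enumerate(parent):
        if p is None or level[p] != level[v]:  # v is the head of its path
            path = [v]
            while centroid_child[path[-1]] is not None:
                path.append(centroid_child[path[-1]])
            paths.append(path)
    return level, paths
```

On a linear tree with four nodes the sizes are 4, 3, 2, 1, so the two top nodes share level 2 and form one centroid path, while the remaining nodes form single-node paths.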

[Me-83] considers problems on rooted trees for which there are serial algorithms that
consist of moving from the leaves towards the root of the tree. It requires that these
algorithms run fast in parallel, say O(log |V|) time, for linear trees, i.e., trees that consist of
one simple path. Megiddo suggests considering the following O(log^2 |V|) time scheme for
extending such linear tree algorithms into general trees:
for i := 0 to ⌈log n⌉ do
    for each centroid path at level i pardo
        apply an adaptation of the linear tree algorithm
    odpar
od


* The base of all logarithms in this paper is two.

Note that nodes in centroid paths of level i are not being considered by this framework
until the treatment of all centroid paths of lower levels is finished. In contrast, the
new method works on nodes of centroid paths of level i in parallel to working on nodes of
lower centroid paths. Therefore, we call the new method the accelerated centroid
decomposition (ACD) method.

3.  The  accelerated  centroid  decomposition  method 

Essentially, each step of the ACD method consists of applying, in parallel, several
operations that reduce the tree at hand. Given an input tree, the method performs these
tree reduction steps so that we end up with only the root. (Later we discuss the
applicability of ACD for parallel algorithms that obey the above centroid decomposition scheme.)

The ACD method allows two kinds of operations (for simplicity, below, we consider
only binary input trees):

PRUNE - Node v becomes ready for the PRUNE operation as soon as it becomes a leaf.
Applying PRUNE to v results in eliminating v from the tree. For some applications, it may
be illegal to perform simultaneous PRUNE operations on both children of a given node;
later comments explain how our solution can be made to avoid simultaneous PRUNEs.

SHORTCUT - Let v be a node whose centroid level in the input tree is i and suppose that
v is not an extreme node in its centroid path (i.e., its centroid path contains both its
parent and one of its children, to be called its centroid child). Node v becomes ready for
SHORTCUT after losing its non centroid child (the method implies that when this happens
v must still have a child with the same centroid level as itself). Applying SHORTCUT to v
results in eliminating v so that its parent becomes the parent of its child. It is illegal
to perform two SHORTCUT operations simultaneously both at a node and at its child.

RAKE and COMPRESS are the names that were used in the original paper of [MR-85] for
operations similar, though not identical, to PRUNE and SHORTCUT, respectively. This
difference was essential for our presentation and, therefore, we use different names.

The reader may find our description rather abstract. Therefore, it might be helpful to
consider the following example throughout this presentation. Recall the minimum vertex
cover (MVC) problem. Given a graph G(V,E), a subset V' of vertices is a vertex cover of G
if at least one endpoint of each edge in E is in V'. The MVC problem is to find a vertex cover
whose cardinality is minimum. This problem is NP-complete for general graphs but has an
obvious "leaf-to-root" linear time serial algorithm for rooted trees: No leaf is
selected for the MVC. This implies that each vertex adjacent to a leaf is selected for the
MVC. Next remove all the vertices that have been selected to be either inside or outside
the MVC and their adjacent edges from the graph. We get one or more rooted trees and
iterate this procedure for each of them. Nodes that become singletons are not selected for
the MVC. It is easy to show that this vertex cover is indeed of minimum cardinality.
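
The serial leaf-to-root procedure can be written compactly as in the following Python sketch. It is an illustration, not part of the parallel method; the `parent` array representation and the name `tree_mvc` are assumptions. The iterative removal described above amounts to the rule that a vertex is selected exactly when at least one of its children was left out of the cover.

```python
def tree_mvc(parent):
    """Vertices selected by the leaf-to-root rule; parent[root] is None."""
    n = len(parent)
    children = [[] for _ in range(n)]
    root = None
    for v, p in enumerate(parent):
        if p is None:
            root = v
        else:
            children[p].append(v)
    # Visit order in which every parent precedes its children.
    order = [root]
    for v in order:
        order.extend(children[v])
    selected = [False] * n
    for v in reversed(order):  # children are decided before their parent
        # v must cover the edge to any child that was left out of the cover.
        selected[v] = any(not selected[c] for c in children[v])
    return {v for v in range(n) if selected[v]}
```

On a path with four vertices rooted at one end, the leaf is skipped, its parent is selected, and the selection then alternates up to the root.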

Next, we demonstrate PRUNE for the MVC problem. Consider a vertex v to which
PRUNE should be applied; v is a leaf. If v has not been selected for the MVC then remove
v and select its parent for the MVC. If v was selected then simply remove v.

We demonstrate SHORTCUT for the MVC problem. Let v be a node of some centroid
path whose non centroid child x was removed. If x was placed in the MVC then so
far we have not been obliged to add v to the MVC, while if x was not placed in the MVC
then we are obliged to add v to the MVC. Thus there are three possibilities: 1. v is in the
MVC. 2. Based on its descendants in its centroid path v is not in the MVC. Using the
notion of motion from the leaves to the root and the idea of including the furthest vertex
possible in the MVC we conclude (upon performing SHORTCUT at v) that v is not in the
MVC. 3. Neither 1 nor 2 (namely, we do not know yet whether v is in the MVC).

In order to understand the situation better let us be more specific. Consider an
instance where the whole tree is one directed path (linear tree). Let u be the parent of v,
and w be the (only) child of v. Then the "spirit" of the serial algorithm implies the
following two dependencies: (a) u is in the MVC if and only if v is not, and (b) v is in the MVC
if and only if w is not. This enables us to do the following SHORTCUT: remove v,
"remember" dependency (b), and compose (a) and (b) into the following dependency: (c)
u is in the MVC if and only if w is. Dependency (b) provides a pointer from v to w.
Similarly, a later SHORTCUT may remove w and provide w with a pointer to a vertex which
has not yet been removed and so on. This linked list of pointers starting from v must lead
to a vertex whose membership in the MVC was determined after all SHORTCUTs and
PRUNEs are over. Determining the membership of all vertices in the list is done by
backtracking. We finish with this instance of linear trees by observing that a pair of
dependencies of type (a) (opposite participation in MVC) or of type (c) (same participation in
MVC) can be composed in a similar way to provide a dependency of one of these two
forms.
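
One way to picture the composition of dependencies is to encode each dependency as a single bit, so that composing two dependencies is an exclusive-or. The Python sketch below follows this idea for a linear tree. The encoding and the names `OPPOSITE`, `SAME`, `compose`, `resolve` and `head_membership` are assumptions made for illustration and do not appear in the method itself.

```python
from functools import reduce

OPPOSITE, SAME = 1, 0  # hypothetical encoding of the two dependency forms


def compose(dep_uv, dep_vw):
    """Dependency of u on w obtained when SHORTCUT removes v."""
    return dep_uv ^ dep_vw


def resolve(member_of_target, dep):
    """Membership (0 or 1) implied by a dependency on a decided vertex."""
    return member_of_target ^ dep


def head_membership(m):
    """Linear tree v_0 (root), ..., v_m (leaf); every edge starts as OPPOSITE."""
    dep_root_to_leaf = reduce(compose, [OPPOSITE] * m, SAME)
    leaf_in_cover = 0  # the serial algorithm never selects a leaf
    return resolve(leaf_in_cover, dep_root_to_leaf)
```

Composing (a) and (b) gives `compose(OPPOSITE, OPPOSITE) == SAME`, which is dependency (c). Backtracking corresponds to calling `resolve` along the remembered pointers once the vertex at the end of the list has been decided.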

We return to the MVC problem on general trees. As with linear trees we initially
assign opposite participation dependency to any pair of successive vertices in the same
centroid path, ignoring edges between vertices whose centroid level is different. Let u be the
parent of v and w its child in the present tree and suppose that the centroid levels of both u
and w are the same as the centroid level of v. (We never perform SHORTCUT if this
assumption does not hold.) Also, recall that we assumed v no longer has a non centroid
child. If possibility 3 holds we compose the dependencies between u and v and v and w
into a dependency between u and w, thereby making w a child of u. We also remove v and
"remember" the dependency between v and w in order to be able to determine later
whether v is in the MVC. Suppose that either possibility 1 or 2 holds. In this case, our
general method becomes redundant when it comes to the relation between u and w. It also
makes w a child of u. However, this has no algorithmic meaning since there is no
non-trivial dependency between w and u. (The reader should realize that the problem
considered here is perhaps one of the easiest to be solved by the ACD method and therefore a
few cases are degenerate and do not need the full power of the method.) Finally, we
consider direct implications on u. If we already know that u is in the MVC then we are in a
degenerate case. Otherwise, using the dependency between u and v, we either decide that
u is in the MVC or that: "based on its descendants in its centroid path u need not be in the
MVC".

The new method actually has three stages: the Preparatory stage, the Scheduling stage and
the Evaluation stage.

The Preparatory Stage. The input for this stage is a rooted tree where each node has a
linked list of its children (incoming edges) and its parent (outgoing edge). The following
things are being computed:

1. The tree is "binarized": For each node that has more than two children replace its
incoming edges by a binary tree where: 1. each child is a leaf in this binary tree; 2. the
node is the root; and 3. if all leaves are pruned, the tree comprises a path. See Fig. 2.
Remark. It is now simple to adapt the MVC implementation to this binarization. Each
internal node in this binary tree has two kinds of children: an auxiliary node or leaves. The
adaptation consists of the following initialization: 1. Assign opposite participation
dependency to an internal node with respect to a leaf child. 2. Assign the same participation
dependency to an internal node with respect to a child which is another internal node.
It is not difficult to verify that this solves the MVC problem.

In general, the binarization may need some ad hoc effort which differs from one
application to the next.

2.  SIZE(v).  SIZE(v)  is  the  number  of  nodes  in  the  tree  rooted  at  v,  for  each  node  v. 

3.  TAIL(v),  HEAD(v)  and  RANK(v).  Consider  the  centroid  path  that  contains  v. 
TAIL(v)  (resp.  HEAD(v)  )  is  the  furthest  descendent  (resp.  ancestor)  of  v  on  this  path. 
RANK(v)  is  the  length  of  the  path  from  v  to  HEAD(v). 

Let v_1, v_2, ..., v_{a+1} be a centroid path at level i, where v_1 is the head of the path
and v_{a+1} is the tail of the path. For reasons that will become clear later, we focus only
on nodes v_1, ..., v_a. Let SIZE_j be SIZE(v_j) − SIZE(v_{j+1}), 1 ≤ j ≤ a. In words, SIZE_j is
the size of the tree rooted at v_j excluding the tree rooted at v_{j+1}. Observe that

\sum_{j=1}^{a} SIZE_j = SIZE(v_1) − SIZE(v_{a+1}).

4. All the prefix sums A_k = \sum_{j=1}^{k} SIZE_j, for 1 ≤ k ≤ a. Thus
A_a = \sum_{j=1}^{a} SIZE_j < 2^i.
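
For concreteness, the quantities SIZE_j and A_k of one centroid path can be obtained as in the short Python sketch below. The input list `path_sizes`, holding SIZE(v_1), ..., SIZE(v_{a+1}) head first, and the function name are assumptions of the sketch; in the parallel setting the prefix sums would be computed by a parallel prefix computation rather than by `itertools.accumulate`.

```python
from itertools import accumulate


def path_prefix_sums(path_sizes):
    """path_sizes = [SIZE(v_1), ..., SIZE(v_{a+1})] along one centroid path."""
    # SIZE_j = SIZE(v_j) - SIZE(v_{j+1}) for j = 1..a (stored at index j - 1).
    size_j = [path_sizes[j] - path_sizes[j + 1] for j in range(len(path_sizes) - 1)]
    # A_k = SIZE_1 + ... + SIZE_k (stored at index k - 1).
    A = list(accumulate(size_j))
    return size_j, A
```

For a path whose nodes have sizes 16, 13, 12 and 9 (all of centroid level 4), the differences are 3, 1, 3 and the last prefix sum is 16 − 9 = 7.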

The Main (Scheduling) Stage. This stage is devoted to scheduling each node of the tree
for either PRUNE or SHORTCUT. We provide an inductive description of the Main
Stage. We will schedule the removal of each node of the tree and decide whether the
removal is done by PRUNE or SHORTCUT. At the end we explain how to assign processors to
the operations of each time unit. Our inductive description will make use of the following
inductive claim regarding the reduced tree following time T(i), where i is the centroid level
and T is some integer valued function.
The  Inductive  Claim. 

A.  The  reduced  tree  at  the  end  of  time  T(i)  satisfies  the  following:  1.  Every  node,  whose 
centroid  level  is  <  i,  has  been  removed.  2.  Every  node,  whose  centroid  level  is  i  and  is 
not  the  head  of  its  centroid  path,  has  been  removed.  3.  Every  node,  whose  centroid  level 
is  i  and  is  the  head  of  its  centroid  path,  is  a  leaf. 

B. The reduced tree at the end of time T(i) + 2 satisfies the following: any centroid path,
whose level is i+1, that contains at least two nodes (in the input tree) has two nodes which
have not yet been removed. These two nodes are the tail and head of the centroid path.

Next, we assume the Inductive Claim for i and show how to satisfy it for i+1 within 4
time steps past time T(i). This clearly implies that T(i) = 4i. We will see that this also
implies a total of O(log n) time for the Evaluation Stage that follows and for the whole
algorithm.

For  i  =  0  the  Inductive  Claim  holds  trivially.  Assume  the  Inductive  Claim  holds  for  i. 
We  describe  iteration  i+1  of  the  Main  Stage.  It  is  devoted  to  satisfying  the  Inductive  Claim 
for  i+1.  Satisfying  item  A  of  the  Inductive  Claim  for  i+1  is  very  simple:  Recall  that  item  B 
for  i  implies  that  after  time  T(i)  +  2  each  path  of  centroid  level  i+1  remains  with  at  most 
two  nodes.  In  case  any  of  these  two  nodes  has  a  child  whose  centroid  level  is  <  i+1  then 
such  a  child  must  be  a  leaf  whose  centroid  level  is  i  (by  A3  for  i). 

Our  solution  for  satisfying  item  A  is  as  follows: 

Schedule two serial steps of PRUNE: one at time T(i) + 3 and the second at time T(i) + 4
(as defined by Fig. 3).

The rest of this description is devoted to satisfying item B of the Inductive Claim for
i+1. Consider the tree rooted at the head of the centroid path of level i + 2 excluding the
subtree rooted at the tail (See Fig. 1). Each node of the centroid path (recall that the tail is
excluded in this part) has a centroid child and perhaps another, non-centroid, child. We
would like to schedule the SHORTCUT operations to be applied to the nodes on the
centroid path. However, the fact that different nodes of the centroid path become ready for
SHORTCUT at different times complicates the situation.

Our algorithm uses: 1. The inductive assumption that if the tree rooted at any
non-centroid child has x nodes then either this child has been removed or it is a leaf at time
T(⌈log x⌉). 2. The fact, mentioned above, that the total number of descendants of the
centroid path of level i + 2 (excluding the tail) is < 2^{i+1}. 3. The data which were computed in
the Preparatory Stage.

Let us focus on some centroid path at level i + 2: v_1, v_2, ..., v_{a+1}. Since item B does not
imply anything regarding the subtree rooted at v_{a+1}, we focus further only on nodes
v_1, ..., v_a. Let u_j be the non centroid child of v_j, for 1 ≤ j ≤ a. (Node v_j need not have a non
centroid child.) Let SIZE_j and their prefix sums A_j be as above. Recall that

\sum_{j=1}^{a} SIZE_j < 2^{i+1}.

By the inductive hypothesis, node u_j, 1 ≤ j ≤ a, becomes a leaf by time
t_j = T(⌈log SIZE(u_j)⌉).

Our solution is:

PRUNE u_j at time step t_j + 1, for j > 1.
This makes node v_j ready for SHORTCUT.

Suppose a ≥ 2 (otherwise the inductive hypothesis B already holds). We will ensure
that by the end of time T(i+1) + 2 the nodes v_2, ..., v_a have all performed SHORTCUT.
First, we give a concise description of our solution. Later, we use some illustrations in
order to give a more intuitive explanation. For each j, 1 < j ≤ a, consider the binary
representation of A_{j-1} − 1 and A_j − 1. Let K_j be the index of the most significant bit in
which these two binary representations are different.
(Example. Let the representation of A_{j-1} − 1 be 010101 and that of A_j − 1 be 011011.
Then K_j is 4, counting bit positions from 1 at the least significant end.)

Remarks: 

1. For K_j to be well-defined, we concatenate zeros to the left of the binary representation
of A_{j-1} − 1.

2. Bit K_j is zero in (the representation of) A_{j-1} − 1 and one in A_j − 1. (Since A_j − 1 is
larger than A_{j-1} − 1).

Our solution is very simple to state:
SHORTCUT at node v_j, 1 < j ≤ a, at time T(K_j) + 2.
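
The computation of K_j and of the resulting SHORTCUT times can be sketched in a few lines of Python. The sketch assumes that bit positions are counted from 1 at the least significant end and that T(i) = 4i, as derived earlier in this section; the function names and the dictionary representation are hypothetical.

```python
def shortcut_levels(A):
    """A = [A_1, ..., A_a] (strictly increasing). Returns {j: K_j} for 1 < j <= a."""
    K = {}
    for j in range(2, len(A) + 1):
        lo, hi = A[j - 2] - 1, A[j - 1] - 1  # A_{j-1} - 1 and A_j - 1
        # The highest set bit of lo XOR hi is the most significant differing bit.
        K[j] = (lo ^ hi).bit_length()
    return K


def shortcut_times(A, T=lambda i: 4 * i):
    """Time step T(K_j) + 2 at which SHORTCUT is applied to v_j."""
    return {j: T(k) + 2 for j, k in shortcut_levels(A).items()}
```

With A_{j-1} − 1 = 010101 (decimal 21) and A_j − 1 = 011011 (decimal 27), the exclusive-or is 001110, whose highest set bit is at position 4.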

In order to establish the validity of our solution we show:
Validity Claim 1. We never SHORTCUT at a node before it is ready.

To see this, observe that SIZE_j = (A_j − 1) − (A_{j-1} − 1) ≤ 2^{K_j} − 1. By the inductive
hypothesis, if node v_j has a non centroid child at time T(K_j) then it must be a leaf and we
PRUNE it at time T(K_j) + 1.

Validity Claim 2. We never perform two simultaneous SHORTCUTs at two adjacent nodes
of the path. (That is, it is impossible that the two nodes are adjacent at the time at which
the SHORTCUTs are performed. Earlier we called such SHORTCUTs illegal.)
To show this, we need the following observation:

Observation. For every K_j = K_l, where 1 < j < l ≤ a, there exists K_p > K_j, where
j < p < l. In other words, given two nodes v_j, v_l at which simultaneous SHORTCUTs are
performed we are guaranteed to have node v_p between them at the time of these
SHORTCUTs, and therefore they are legal and Validity Claim 2 follows.
Proof of Observation. Remark 2 implies that bit K_j at A_j − 1 is 1, and bit K_l (= K_j) at
A_{l-1} − 1 is 0. Therefore, for some j < x < l, bit K_j was 1 at A_{x-1} − 1 and 0 at A_x − 1.
Apply Remark 2 again to conclude K_x > K_j, which proves the Observation.
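
Validity Claim 2 can also be checked experimentally. The Python sketch below simulates the SHORTCUTs on a single centroid path, batch by batch in increasing order of K_j, and reports whether any two nodes removed in the same batch are adjacent at that moment. It is a test harness written for illustration (names and representation are assumptions), not part of the algorithm.

```python
from itertools import accumulate


def check_schedule(size_j):
    """size_j = [SIZE_1, ..., SIZE_a], each >= 1. True if the schedule is legal."""
    A = list(accumulate(size_j))
    a = len(A)
    K = {j: ((A[j - 2] - 1) ^ (A[j - 1] - 1)).bit_length() for j in range(2, a + 1)}
    alive = list(range(1, a + 2))  # indices of v_1, ..., v_{a+1} still on the path
    for level in sorted(set(K.values())):
        batch = {j for j in K if K[j] == level}
        # No two nodes of the same batch may be neighbours on the current path.
        for x, y in zip(alive, alive[1:]):
            if x in batch and y in batch:
                return False
        alive = [j for j in alive if j not in batch]
    # Only the head and the tail remain, as item B requires.
    return alive == [1, a + 1]
```

When every SIZE_j equals 1 the batches are the even positions of the path, then the even positions of what remains, and so on.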

For readers who seek a more intuitive explanation of our Scheduling Stage, we provide
the following illustrative explanation. Consider an auxiliary balanced binary tree
whose leaves are all binary strings of length i+1 ordered lexicographically. (Which is,
clearly, also the order of the numbers that they represent.) If SIZE_j = 1 for each j in the
path then we could have simply SHORTCUT at each even location of the path and
repeatedly done the same to the remaining path. Observe that this corresponds to the structure of
the binary tree. More generally, we identify the half open interval of leaves
(A_{j-1} − 1, A_j − 1] with node v_j, 1 < j ≤ a. Denote by LCA_j the lowest common
ancestor of leaf A_{j-1} − 1 and leaf A_j − 1 in the binary tree. The crucial observation is that K_j
of our above solution is exactly the height of LCA_j in the tree (that is, the length of the
path in the tree from LCA_j to its closest leaf). We leave it to the interested reader to
verify that both Validity Claims above and the Completeness Claim below can be proved using
the insight provided by this auxiliary binary tree.

Completeness  Claim.   Each  node  is  scheduled  for  removal  during  the  Main  Stage. 

Proof of Completeness Claim. Consider node v whose centroid level is i+1. If it is the
tail of its centroid path then it will be removed using PRUNE at time T(i) + 4. If it is
neither a tail nor a head then it will be removed at time T(j) + 2 for some j ≤ i, using
SHORTCUT. If it is the head of its centroid path, there are two possibilities. Let u be the
parent of v and let j be the centroid level of u. If u is a head or tail then v is removed at time
T(j) + 3, using PRUNE. Otherwise, v is removed at time T(i) + 1 using PRUNE.

Comment 1. If some application does not allow two simultaneous PRUNEs at two
children of the same node we can arbitrarily schedule one before the other. Since this may
happen only at a time step of the form T(·) + 3, this increases the number of time steps
past time T(i) until time T(i+1) to 5 and, therefore, T(i) to 5i.

In the above presentation our goal was to get a time bound of the form O(log n) and
we did not bother too much about the constant that multiplies log n. The authors see a few
ways for modifying the accelerated centroid decomposition method in order to reduce this
constant. In the following comment we present one such way which seems to be of
independent interest since it is based on an alternative notion of centroid decomposition.
Our description focuses on the changes to the above scheduling method.

Comment  2.  Alternative  centroid  decomposition  and  Scheduling  Stage. 

We partition the nodes of the tree into paths called centroid lanes (instead of centroid
paths). Each internal node of the tree selects its child whose SIZE is largest to be its
successor in its centroid lane. (In the event of a tie one of the children is selected arbitrarily.)
It can be easily verified that: (1) each centroid lane comprises several centroid paths of
decreasing levels; (2) the tail of each centroid lane is always a leaf. (We see below that this
simplifies considerably the situation described in Comment 1 above as we do not need to
avoid performing simultaneous PRUNEs.) The level of a centroid lane is the centroid level
of its head.
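
The partition into centroid lanes is easy to compute once SIZE is known. The following Python sketch is an illustration only; the `parent` and `size` arrays, the function name, and the tie-breaking rule (the first child encountered wins) are assumptions.

```python
def centroid_lanes(parent, size):
    """Partition the nodes into centroid lanes, each listed head first."""
    n = len(parent)
    # heavy[p] is the child of p with the largest SIZE (first one wins a tie).
    heavy = [None] * n
    for v, p in enumerate(parent):
        if p is not None and (heavy[p] is None or size[v] > size[heavy[p]]):
            heavy[p] = v
    lanes = []
    for v, p in enumerate(parent):
        if p is None or heavy[p] != v:  # v was not selected by its parent
            lane = [v]
            while heavy[lane[-1]] is not None:
                lane.append(heavy[lane[-1]])
            lanes.append(lane)
    return lanes
```

Every lane produced this way ends at a node with no children, which matches property (2) above.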

Later  we  describe  an  alternative  scheduling  stage.  First,  we  mention  two  important 
differences  with  respect  to  the  previous  scheduling. 

(1)  We  process  tails  of  centroid  lanes  in  the  same  way  as  other  nodes  of  a  lane.  For  this  we 
extend  the  definition  of  Kj  above  to  K^^  +  j  in  the  natural  way.  Heads  still  get  a  special 
treatment.  We  note  that  whenever  it  is  written  below  that  SHORTCUT  should  be  applied 
to  a  leaf  in  the  reduced  tree,  then  the  convention  is  that  PRUNE  is  actually  applied. 

(2) The total number of descendants of the centroid lane of level i+1 is ≤ 2^{i+1} (and not
2^i as for centroid paths).

The  Inductive  Claim  is:  at  the  end  of  time  T(i)  each  level  i  lane  is  reduced  to  its  head  node 
and  at  most  one  other  node. 

The  Alternative  Scheduling  Method:  Use  three  time  steps  per  level. 

Time  step  T(i)  +  1:  PRUNE  level  i  nodes  that  are  not  head  nodes. 

Time  step  T(i)  +  2:  PRUNE  level  i  head  nodes. 

Time step T(i) + 3: SHORTCUT at each node v_j for which K_j equals i. (Note that apart
from the head node there is just one node in each level i+1 lane that is not SHORTCUT by
this step.)

This finishes Comment 2.

The  Evaluation  Stage. 

The only remaining problem in actually performing PRUNE and SHORTCUT as per the
schedule found in the previous stage is that of assigning processors to jobs. Each
node knows its schedule for removal. This schedule is a positive number which is O(log n).
We sort the nodes according to their schedules. Lemma 3.4 in [R-85] gives a deterministic
logarithmic time optimal parallel bucket sort algorithm (which is also quoted in [CV-86a])
for sorting n positive numbers which are ≤ log n. It is easy to extend this to positive
numbers which are O(log n) without changing the time and processor complexities. We
use this algorithm to sort the schedules. The assignment of any number p ≤ n/log n of
processors to nodes is then easy. At each step of the implementation, perform the next p
jobs that should be performed in the time unit of our schedule which is currently being
implemented. If fewer than p jobs remain for that time unit, perform all of them; this leaves
some of the processors idle for this step of the implementation (see [CV-86a] for details of a
similar assignment of processors to jobs).
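
The assignment of processors to jobs can be simulated sequentially as in the sketch below.
The function name assign_jobs and the list-based input are hypothetical; a plain sequential
bucket sort stands in for the parallel bucket sort of [R-85], so the sketch shows only the
order in which jobs are performed, not the parallel implementation.

```python
from typing import List


def assign_jobs(schedule: List[int], p: int) -> List[List[int]]:
    """Return the steps of the implementation; each step lists at most p node ids.

    schedule[v] is the time unit at which node v is removed (a positive integer).
    """
    max_t = max(schedule, default=0)

    # Bucket sort the nodes by schedule value (sequential stand-in for the
    # parallel bucket sort).
    buckets: List[List[int]] = [[] for _ in range(max_t + 1)]
    for node, t in enumerate(schedule):
        buckets[t].append(node)

    steps: List[List[int]] = []
    for jobs in buckets:                      # one bucket per time unit of the schedule
        for start in range(0, len(jobs), p):  # next p jobs of the current time unit
            # The last slice of a bucket may hold fewer than p jobs, in which
            # case some processors stay idle for that step.
            steps.append(jobs[start:start + p])
    return steps
```

For example, assign_jobs([1, 1, 1, 2, 2, 3, 1], 2) uses four steps. In general the number
of steps is at most n/p plus the number of time units in the schedule, since each time unit
contributes at most one step that is not full.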

Complexity. We have shown that, using the logarithmic time optimal parallel list ranking
algorithm, our ACD method has the same efficiency. Namely, it takes logarithmic time
using an optimal number of processors.

Remarks. 
1. [MR-85] demonstrate their technique for the problem of expression evaluation. Given an
arithmetic expression, the idea is to apply the logarithmic time optimal parallel algorithm
of [BV-85] in order to transform it into a computation tree form. Given a computation
tree, they show how to apply their method in order to evaluate it. Note, however, that in
[MR-85] a node is ready for SHORTCUT only if neither the node nor its parent has
non-centroid children. It is easy to modify our algorithm to handle such a change in the
definition of SHORTCUT. Simply let another variable S_j play the role of SIZE_j for each j.
Define S_j = max(SIZE_j, SIZE_{j-1}). We then compute prefix sums with respect to S_j and
finish as before, but with the following exception. Note that one K_j at a level i + 2 path may
be i + 2. We add a time step between time step T(i) + 2 and time step T(i) + 3 in which
we SHORTCUT at each node v_j of level i + 2 for which K_j = i + 2. It is easy to see that this
increases the time that elapses between T(i) and T(i+1) by one time unit. Thus the total
running time is still O(log n) using an optimal number of processors.

2. [GR-86] showed how to get a logarithmic time optimal parallel algorithm for the problem
of expression evaluation for expressions which are given in the form needed for the
parsing algorithm of [BV-85]. Their algorithm works in three stages. The first stage simply
applies the algorithm of [BV-85]. The second stage, which is the new part, is a nice
reduction of the evaluation problem into a smaller instance of the evaluation problem. The
third stage consists of applying the method of [MR-85] to this smaller instance of the
evaluation problem. We believe that our method compares favorably with theirs because of the
following three differences. (a) Their improvement seems not to carry through when the
computation tree is already given. (b) Therefore, it does not generalize to other tree
problems to which the method of [MR-85] is applicable. (c) It essentially applies the method of
[MR-85] to a smaller problem instance, while the introduction explained why our method
and presentation are methodologically interesting.

3. Given a graph we define a subset of its vertices to be a dominating set if each vertex
outside this subset is adjacent to a vertex in the dominating set. We note that there is a
serial algorithm for finding minimum dominating sets on trees; this algorithm is similar to
the serial algorithm for the MVC problem. It is straightforward to apply the ACD method
to find the minimum dominating set for a tree. We also note that a very recent paper
[H-86] suggested applying the Miller-Reif method to several more problems on trees. The
ACD method is applicable to all these problems as well.
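
For concreteness, one standard leaf-to-root serial computation of the size of a minimum
dominating set of a rooted tree is sketched below. It is a textbook dynamic program and is
not necessarily the serial algorithm the authors have in mind; the function name
min_dominating_set_size and the three state names are hypothetical.

```python
from typing import Dict, List

INF = float("inf")


def min_dominating_set_size(children: Dict[int, List[int]], root: int) -> int:
    """Size of a minimum dominating set of a rooted tree (serial, leaf-to-root)."""
    # Iterative preorder; its reverse visits every child before its parent.
    order: List[int] = []
    stack = [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))

    inside = {}   # v is in the set
    covered = {}  # v is not in the set and some child of v is in the set
    waiting = {}  # v is not in the set; only its parent may still dominate it
    for v in reversed(order):
        kids = children.get(v, [])
        # If v is in the set, each child may be in any of the three states.
        inside[v] = 1 + sum(min(inside[c], covered[c], waiting[c]) for c in kids)
        # If v is not in the set, each child must be dominated without help from v.
        base = sum(min(inside[c], covered[c]) for c in kids)
        waiting[v] = base
        # For v to be covered, force the cheapest child into the set.
        extra = min((inside[c] - min(inside[c], covered[c]) for c in kids), default=INF)
        covered[v] = base + extra
    # The root has no parent, so it cannot be left waiting.
    return min(inside[root], covered[root])
```

Each node combines only the values of its children, which is the leaf-to-root pattern
referred to in this remark.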

4. Apparently there are "leaf-to-root" serial algorithms for which the SHORTCUT operation
cannot be adapted. For instance, consider the simple serial algorithm for the following
tree partitioning problem. The algorithm is given in [Me-83]. Given a tree T, where a
nonnegative weight is associated with each vertex, and given a nonnegative number λ, the
problem is to delete the maximal number of edges of T so that the weight of each resulting
subtree is at least λ. We note that we run into similar difficulties in trying to adapt the
COMPRESS operation of [MR-85] to the serial algorithm for this problem.